



# Flash Memory Summit 2011

#### **Session 302: Nonvolatile Design Challenges and Methodologies**

The processor's role in maximizing performance and reducing energy consumption

**Neil Robinson** 

# **Tensilica At a Glance**

## 1 billionth core shipped in April 2011; 2 billion expected by end of 2012



#### Market



- Application-specific processor solutions, currently in over 20 market areas
  - Storage, Audio, Baseband, Printers, Cameras, Network infrastructure/access & more



#### **Business – Semiconductor IP licensing**

- 180+ licensees worldwide
- Licensed by 8 of the top 12 semiconductor manufacturers
- Over 30 SSD manufacturers use Tensilica-based controllers today



#### Technology

- Customers *generate* processors with selected options and custom instructions
  - Custom instructions extend a base instruction set that has stayed backward compatible since 1999
- Software development tools and physical components are created automatically
- Each processor is automatically verified (800+ different processors in silicon today)

## **SSD Controller**

## **IOPS: Data Management + Computational Throughput**





# **Improving Data Management**

### **Conventional processors vs Xtensa processors**






### **Xtensa processors provide system flexibility**





# **Ex: Improving Data Management**

### By reducing the number of cycles required to fetch data





# **Increasing Computational Throughput**

### In Fixed Instruction-Set Architecture (ISA) Processors





#### Run the processor faster

- May not be possible
- May have a system/board design impact



#### Optimize the software

- If you have the time and the resources
- Assembly-level optimization adds maintenance burden



#### Choose a faster processor

- Will increase power/energy consumption
- Will cost more (in area & perhaps licensing)

# **Increasing Computational Throughput**

### In Xtensa Processors – more choices





#### Add new instructions

- Only where needed, to accelerate critical areas
- Often over 10x performance improvement
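Why target only the critical areas? Amdahl's law (a standard result, not taken from the slides) bounds the overall gain: if a fraction *p* of the original run time is accelerated by a factor *s*, the overall speedup is 1 / ((1 - p) + p / s). The sketch below evaluates that formula; the function name and the example inputs are illustrative, not measured data.

```c
/* Amdahl's law: overall speedup when a fraction p (0..1) of the original
 * run time is accelerated by a factor s (> 0). The remaining (1 - p)
 * of the run time is unchanged. */
double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, a custom instruction that makes a hot loop 10x faster gives `amdahl_speedup(0.9, 10.0)`, roughly 5.26x overall, if that loop accounts for 90% of run time. This is why the highest-impact instructions target the code that dominates the profile.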



#### Improve register availability

- Add more registers for the compiler to use
- Set custom widths, up to 1024 bits, to keep area as low as possible



#### **Execute multiple instructions at once**

- Add VLIW-style instructions without the code bloat
- A general purpose performance improvement



#### **Increase local memory bandwidth**

- Second load/store unit
- Up to 512 bits per cycle for each unit, 1024 bits total per cycle
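These bandwidth figures translate directly into cycle counts. A minimal sketch, assuming ideal overlap of both load/store units (no bank conflicts or stalls); the function name and the 4 KB buffer in the usage note are hypothetical examples, not figures from the slides.

```c
#include <stdint.h>

/* Cycles needed to stream `bytes` through local memory when `units`
 * load/store units each move `bits_per_unit` bits per cycle.
 * Assumes ideal parallel operation and rounds partial cycles up. */
uint32_t transfer_cycles(uint32_t bytes, uint32_t units,
                         uint32_t bits_per_unit)
{
    uint32_t bytes_per_cycle = units * bits_per_unit / 8u;
    return (bytes + bytes_per_cycle - 1u) / bytes_per_cycle;
}
```

For a hypothetical 4 KB buffer, one 512-bit unit moves 64 bytes per cycle, so `transfer_cycles(4096, 1, 512)` is 64 cycles; adding the second unit (1024 bits total) halves that to 32.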

#### **DMA to local memory**

- For command queues, code overlays, etc.
- No processor cycles required to move data
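A command queue in local memory is commonly laid out as a ring buffer: a producer (for example a DMA engine or host interface) writes entries at the tail while the processor consumes them from the head. The sketch below is a hypothetical single-producer/single-consumer layout; the `ssd_cmd` fields, queue depth and function names are illustrative assumptions, and a queue actually shared with DMA hardware would also need the memory barriers or uncached/volatile access that the platform requires.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_QUEUE_DEPTH 16u /* hypothetical depth; must be a power of two */

/* Hypothetical command entry; real controllers define their own format. */
struct ssd_cmd {
    uint8_t  opcode;
    uint32_t lba;
    uint16_t sectors;
};

/* Ring buffer with free-running head/tail counters: tail - head is the
 * number of queued entries, and unsigned wraparound stays correct
 * because the depth divides 2^32. */
struct cmd_queue {
    struct ssd_cmd entries[CMD_QUEUE_DEPTH];
    uint32_t head; /* next entry the processor will consume */
    uint32_t tail; /* next slot the producer will fill */
};

/* Producer side: returns false if the queue is full. */
bool cmd_queue_push(struct cmd_queue *q, struct ssd_cmd cmd)
{
    if (q->tail - q->head == CMD_QUEUE_DEPTH)
        return false;
    q->entries[q->tail % CMD_QUEUE_DEPTH] = cmd;
    q->tail++;
    return true;
}

/* Consumer side: returns false if the queue is empty. */
bool cmd_queue_pop(struct cmd_queue *q, struct ssd_cmd *out)
{
    if (q->tail == q->head)
        return false;
    *out = q->entries[q->head % CMD_QUEUE_DEPTH];
    q->head++;
    return true;
}
```

Keeping the queue in local memory means the processor reads commands at local-memory latency, while the transfers that fill it cost no processor cycles.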


## **Ex: Increasing Computational Throughput**

### By reducing the number of cycles required to process data

Customers improve performance by more than 10x in real application code...



Copyright © 2011 Tensilica, Inc.



**Byteswap** 
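Byte swapping is a typical candidate for a custom instruction. Without one, a 32-bit byte reversal in portable C takes a series of shifts, masks and ORs, as in the sketch below; a dedicated instruction can do the same work in a single operation. The function name is illustrative, and the slide's actual instruction definition and cycle counts are not reproduced here.

```c
#include <stdint.h>

/* Portable 32-bit byte reversal: 0xAABBCCDD -> 0xDDCCBBAA.
 * On a fixed-ISA core without a byte-reverse instruction, this becomes
 * several shift, mask and OR operations; a custom instruction can
 * collapse them into one. */
uint32_t byteswap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}
```

Applying the function twice returns the original value, which is a convenient self-check when comparing a software version against a hardware one.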

## **Increasing Computational Throughput**

### **Looking at Energy and Area Efficiency**





## **Processor choice affects design flexibility**



Fix your design bottlenecks more efficiently with Xtensa





## Thank you!
